Neural Information Processing Systems

In this Appendix, we derive the fixed-point equations for the order parameters presented in the main text, following and generalising the analysis in Ref. [...].

Saddle-point equations. The saddle-point equations are derived straightforwardly from the obtained free energy by functionally extremising with respect to all parameters. The zero-regularisation limit of the logistic loss can be used to study the separability transition; this result was generalised immediately afterwards by Pesce et al. Following Ref. [59] for the Gaussian case, we obtain the corresponding fixed-point equations.

Mean universality. Following Ref. [...], we check the mean-independence condition. In our case this condition is simpler than in Ref. [...], and we verify that mean-independence indeed holds in this setting.

Numerical experiments. Numerical experiments for the quadratic loss with ridge regularisation were performed by computing the Moore-Penrose pseudoinverse solution.
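The last point, solving the ridge-regularised quadratic loss via the Moore-Penrose pseudoinverse, can be sketched in a few lines. This is a minimal NumPy illustration with made-up dimensions, not the authors' code: in the zero-regularisation limit, the ridge estimator converges to the minimum-norm pseudoinverse solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                      # overparametrised: fewer samples than features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum-norm interpolator via the Moore-Penrose pseudoinverse.
w_pinv = np.linalg.pinv(X) @ y

# Ridge estimator w(lam) = (X^T X + lam I)^{-1} X^T y.
def ridge(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# As lam -> 0+, the ridge solution approaches the pseudoinverse one.
assert np.allclose(ridge(1e-6), w_pinv, atol=1e-4)
```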


45d74e190008c7bff2845ffc8e3facd3-Supplemental-Conference.pdf

Neural Information Processing Systems

In a typical supervised learning task, one is given a training dataset of n ∈ ℕ labeled samples D = ((x_i, y_i) ∈ ℝ^d × ℝ)_{i ∈ [n]}, and a parametric model with m ∈ ℕ parameters, f : ℝ^m × ℝ^d → ℝ. The task is to find parameters fitting the training data, i.e. find θ ∈ ℝ^m such that f(θ; x_i) ≈ y_i for all i ∈ [n].
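For concreteness, the simplest instance of this setup is a linear model f(θ; x) = ⟨θ, x⟩ with m = d, fitted by least squares. The sketch below is illustrative and not tied to any particular paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.standard_normal((n, d))          # inputs x_i as rows
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true                       # noiseless labels y_i

# Linear model f(theta; x) = <theta, x>; fit theta by least squares.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fit satisfies f(theta_hat; x_i) ≈ y_i for all i in [n].
assert np.allclose(X @ theta_hat, y)
```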





Are Hallucinations Bad Estimations?

Liu, Hude, Hu, Jerry Yao-Chieh, Zhang, Jennifer Yuntong, Song, Zhao, Liu, Han

arXiv.org Machine Learning

We formalize hallucinations in generative models as failures to link an estimate to any plausible cause. Under this interpretation, we show that even loss-minimizing optimal estimators still hallucinate. We confirm this with a general high-probability lower bound on the hallucination rate for generic data distributions. This reframes hallucination as a structural misalignment between loss minimization and human-acceptable outputs, and hence as an estimation error induced by miscalibration. Experiments on coin aggregation, open-ended QA, and text-to-image generation support our theory.
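A toy version of the coin-aggregation setting illustrates the claim. Under the squared loss (my choice here for illustration), the loss-minimizing estimate is the sample mean, which need not be linkable to any plausible cause:

```python
import numpy as np

# Coin flips: every plausible outcome is 0 or 1.
flips = np.array([0, 1, 1, 0, 1, 0, 0, 1])

# The squared-loss-minimising estimate is the sample mean ...
estimate = flips.mean()

# ... which cannot be linked to any plausible cause: no single
# flip can ever produce the value 0.5.
hallucinated = estimate not in {0, 1}
assert hallucinated
```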


A Proof of Theorem

Neural Information Processing Systems

Proposition 2. Using the same notations as in Proposition 1, we have the following results.

Proposition 4. If a node contains the examples ...

Proposition 5. For the sigmoid loss, we have ...

Algorithm 2 gives pseudocode for finding the optimal split for a given feature. Output: the split (f, t) that gives the largest risk reduction.

Algorithm 5: Find_Split(κ, F, T). Input: κ - node; F - number of attributes; T - number of threshold values per attribute. Output: collection of trained decision trees.
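A split-finding routine of the kind Algorithm 2 describes can be sketched as follows. This is a generic illustration using squared error as the risk, not the propositions' actual risk, and all names are mine:

```python
import numpy as np

def find_split(X, y, n_thresholds=8):
    """Return the (feature, threshold) pair with the largest risk
    reduction, using squared error as an illustrative risk."""
    def risk(labels):
        return ((labels - labels.mean()) ** 2).sum() if len(labels) else 0.0

    parent = risk(y)
    best = (None, None, 0.0)        # (feature, threshold, risk reduction)
    for f in range(X.shape[1]):
        # Candidate thresholds: quantiles of this feature's values.
        for t in np.quantile(X[:, f], np.linspace(0.1, 0.9, n_thresholds)):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            reduction = parent - risk(left) - risk(right)
            if reduction > best[2]:
                best = (f, t, reduction)
    return best

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
y = (X[:, 1] > 0).astype(float)     # label depends only on feature 1
f, t, red = find_split(X, y)
assert f == 1                       # the informative feature is recovered
```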


Variational Learning Finds Flatter Solutions at the Edge of Stability

Ghosh, Avrajit, Cong, Bai, Yokota, Rio, Ravishankar, Saiprasad, Wang, Rongrong, Tao, Molei, Khan, Mohammad Emtiyaz, Möllenhoff, Thomas

arXiv.org Machine Learning

Variational Learning (VL) has recently gained popularity for training deep neural networks and is competitive with standard learning methods. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but there are few tools to unravel the implicit regularization in play. Here, we analyze the implicit regularization of VL through the Edge of Stability (EoS) framework. EoS has previously been used to show that gradient descent can find flat solutions, and we extend this result to VL to show that it can find even flatter solutions. This is obtained by controlling the posterior covariance and the number of Monte Carlo samples drawn from the posterior. These results are derived in the same fashion as the standard EoS literature for deep learning: by first deriving a result for a quadratic problem and then extending it to deep neural networks. We empirically validate these findings on a wide variety of large networks, such as ResNet and ViT, and find that the theoretical results closely match the empirical ones. Ours is the first work to analyze the EoS dynamics in VL.
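The quadratic problem underlying EoS analyses is easy to illustrate. The sketch below shows only the standard gradient-descent stability threshold 2/η on a one-dimensional quadratic, not the VL-specific flatness result:

```python
# Gradient descent on the quadratic L(w) = 0.5 * h * w^2 iterates
# w_{t+1} = (1 - eta * h) * w_t, which is stable iff the
# sharpness h (the loss curvature) satisfies h < 2 / eta.
def gd_final(h, eta, steps=100, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= eta * h * w
    return abs(w)

eta = 0.1
assert gd_final(h=19.0, eta=eta) < 1.0   # h < 2/eta = 20: converges
assert gd_final(h=21.0, eta=eta) > 1.0   # h > 2/eta: diverges
```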


Deep Learning for VWAP Execution in Crypto Markets: Beyond the Volume Curve

Genet, Remi

arXiv.org Artificial Intelligence

Volume-Weighted Average Price (VWAP) is arguably the most prevalent benchmark for trade execution, as it provides an unbiased standard for comparing performance across market participants. However, achieving VWAP is inherently challenging due to its dependence on two dynamic factors, volumes and prices. Traditional approaches typically focus on forecasting the market's volume curve, a strategy that may hold up under steady conditions but becomes suboptimal in more volatile environments or in markets such as cryptocurrency, where prediction error margins are higher. In this study, I propose a deep learning framework that directly optimizes the VWAP execution objective by bypassing the intermediate step of volume curve prediction. Leveraging automatic differentiation and custom loss functions, my method calibrates order allocation to minimize VWAP slippage, thereby fully addressing the complexities of the execution problem. My results demonstrate that this direct optimization approach consistently achieves lower VWAP slippage than conventional methods, even when utilizing the naive linear model presented in arXiv:2410.21448. They validate the observation that strategies optimized for VWAP performance tend to diverge from accurate volume-curve predictions and thus underscore the advantage of directly modeling the execution objective. This research contributes a more efficient and robust framework for VWAP execution in volatile markets, illustrating the potential of deep learning in complex financial systems where direct objective optimization is crucial. Although my empirical analysis focuses on cryptocurrency markets, the underlying principles of the framework are readily applicable to other asset classes such as equities.


Optimal Dynamic Regret in LQR Control

Neural Information Processing Systems

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. Our rate improves the best known rate of \tilde{O}(\sqrt{n(\mathcal{TV}(M_{1:n})+1)}) for general convex losses and is information-theoretically optimal for LQR. Main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster & Simchowitz (2020), as well as a new \emph{proper} learning algorithm with an optimal \tilde{O}(n^{1/3}) dynamic regret on a family of "minibatched" quadratic losses, which could be of independent interest.
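The quantities in the bound can be made concrete with a toy sketch: dynamic regret compares the learner against a time-varying comparator sequence, and \mathcal{TV} is that sequence's path length. All names below are illustrative:

```python
def dynamic_regret(losses, plays, comparators):
    """Learner's total loss minus the total loss of a
    time-varying comparator sequence."""
    return (sum(l(p) for l, p in zip(losses, plays))
            - sum(l(c) for l, c in zip(losses, comparators)))

def total_variation(comparators):
    """Path length TV(M_{1:n}): sum of successive comparator shifts."""
    return sum(abs(a - b) for a, b in zip(comparators[1:], comparators[:-1]))

# Quadratic losses l_t(x) = (x - m_t)^2 with slowly drifting minima m_t.
minima = [0.1 * t for t in range(5)]
losses = [lambda x, m=m: (x - m) ** 2 for m in minima]
plays = [0.0] * 5                    # a static learner
regret = dynamic_regret(losses, plays, minima)

assert abs(total_variation(minima) - 0.4) < 1e-9
assert abs(regret - 0.3) < 1e-9      # static play pays sum of m_t^2
```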